Compiling XSLT 2.0 into XQuery 1.0 


Achille Fokoue, Kristoffer Rose, Jérôme Siméon, Lionel Villard
IBM T.J. Watson Research Center 
P.O.Box 704, Yorktown Heights 
NY 10598, USA 


{achille, krisrose, simeon, villard}@us.ibm.com


ABSTRACT 


As XQuery is gathering momentum as the standard query language 
for XML, there is a growing interest in using it as an integral part 
of the XML application development infrastructure. In that context, 
one question which is often raised is how well XQuery interoper- 
ates with other XML languages, and notably with XSLT. XQuery 
1.0 [16] and XSLT 2.0 [7] share a lot in common: they share 
XPath 2.0 as a common sub-language and have the same expres- 
siveness. However, they are based on fairly different programming 
paradigms. While XSLT has adopted a highly declarative template 
based approach, XQuery relies on a simpler, and more operational, 
functional approach. 

In this paper, we present an approach to compile XSLT 2.0 into 
XQuery 1.0, and a working implementation of that approach. The 
compilation rules explain how XSLT’s template-based approach 
can be implemented using the functional approach of XQuery and 
underpin the tight connection between the two languages. The
resulting compiler can be used to migrate an XSLT code base to
XQuery, or to enable the use of XQuery runtimes (e.g., as will soon 
be provided by most relational database management systems) for 
XSLT users. We also identify a number of areas where compati- 
bility between the two languages could be improved. Finally, we 
show experiments on actual XSLT stylesheets, demonstrating the 
applicability of the approach in practice. 


Categories and Subject Descriptors 


D.2 [Software]: Software Engineering 


General Terms 


Languages, Standardization 


1. INTRODUCTION


As XQuery 1.0 [3] gets closer to recommendation, developers 
are starting to consider it as a viable alternative platform for XML 
application development. As a result, the question of how XQuery 
fits with the existing XML infrastructure becomes a crucial one. In
particular, developers often ask how to use XQuery together with
existing XSLT-based applications. In this paper we describe
an approach to compile XSLT transformations into XQuery, and an
implementation based on that approach. This provides a practical
solution for using XQuery and XSLT jointly in a way that is both 
effective and efficient. 

Despite their similarities, understanding the precise relationship
between XSLT and XQuery is not as easy as it seems. On
the one hand, XSLT 2.0 [7] and XQuery 1.0 [3] share many char- 
acteristics. Both have XPath 2.0 [1] as a subset and are based on 
a common data model [5]. Both are functional languages with- 
out side-effects, and both are Turing-complete. On the other hand, 
XSLT and XQuery are based on fairly different designs. XSLT re- 
lies on a highly declarative template-based approach which gives 
the ability to easily extend existing programs or merge programs 
together. XQuery is based on a purely functional approach, which 
gives more direct control to the user but is somewhat more opera- 
tional. 

Since they have the same expressive power, one could argue that 
either XSLT or XQuery could be used for any given application. 
Another option would be to rely solely on the fact that XQuery and 
XSLT share a common data model. However, experience suggests 
a need for tighter coupling between those technologies. First of all, 
even if the languages target two different user communities, mod- 
ern applications will increasingly require expertise from both. In 
addition, certain applications are more easily written with one lan- 
guage or the other. For instance, joins are very naturally expressed 
using XQuery’s “FLWOR” expressions, while XML to HTML conversion
is still often easier to write using XSLT’s template-based
approach. Finally, some popular systems will support only one of 
those two languages, but not the other. For instance, all popular 
database management systems [4] are planning to support XQuery, 
but not always XSLT, while some popular editors and libraries sup- 
port XSLT but not XQuery. For all those reasons, there is a strong 
need to develop technology which can provide a tight coupling be- 
tween the two languages. 

The main contribution of this paper is an approach to compile 
XSLT 2.0 stylesheets into XQuery 1.0, which provides the foun- 
dations for a tight coupling between the languages. The compiler 
covers almost the complete XSLT 2.0 language, and we provide 
experiments with our current implementation that show that the ap- 
proach is practical and effective. Because of space limitations, we 
concentrate on explaining the compilation rules for the core of the 
compiler, notably how to compile XSLT’s template based approach 
to XQuery’s functional approach. We also identify key problems in 
making the compiler complete, which often relate to specific se- 
mantic incompatibilities between the two languages. For most of 
those problems, practical solutions are proposed and have been im- 
plemented. 

One of the strengths of the proposed approach is that the resulting
compiler can be used for a variety of practical needs. It can be
used by XQuery developers who may want to migrate an existing
code base to XQuery. It can be used by XSLT developers who may 
want to write applications in XSLT, while running those applica- 
tions on top of an existing XQuery run-time, such as provided by 
relational database systems. It can also be used as a component in 
providing a common XQuery-XSLT infrastructure, which in turn 
can be used to enable the development of common optimizations, 
as well as the ability to call templates from XQuery expressions, or 
vice versa. 
The key technical contributions in the paper are: 


• We provide detailed compilation rules from the template-based
approach of XSLT 2.0 into XQuery’s functional approach.
The rules are designed in order to provide the most
natural compilation, so that the resulting program can easily
be understood by an experienced XQuery programmer.


• Covering the complete XSLT language is difficult due to its
size and complexity. We identify the fragments of the language
that are the most challenging to compile into XQuery,
and provide some corresponding solutions. In some cases,
we identify concrete locations for which the alignment between
XSLT 2.0 and XQuery 1.0 could be improved.


• This approach has been implemented in a running prototype.
We describe the architecture of that prototype and provide
experiments which demonstrate the feasibility of the approach.
Our current prototype runs a very large fraction of
a full set of XSLT conformance tests, and has been tested on
a number of non-trivial stylesheets.


The paper is organized as follows. In Section 2, we illustrate the
compilation approach on a simple example. In Section 3, we give 
the translation rules for the heart of the compiler. In Section 4, we 
focus on the most complex detailed issues that must be addressed 
to support the complete language. We describe the implementation 
of our XSLT to XQuery compiler and present experiment results 
in Section 5. Finally we conclude and give some perspectives in 
Section 7. 


2. APPROACH AND EXAMPLE 


In this section, we illustrate our approach by describing the com- 
pilation of a simple XSLT stylesheet into XQuery, and use that ex- 
ample to explain some of the key technical challenges and how to 
address them. 


2.1 The recipe example 


Figure 1 shows a simple recipe stylesheet inspired by the Sarvega 
XSLT benchmark [11]. This stylesheet formats a single recipe 
XML document to HTML. 

An XSLT stylesheet is composed of templates. Each template 
associates a pattern that matches against certain nodes to the eval- 
uation of an expression. When the node currently being processed 
matches a given match pattern, its associated template is evaluated 
to create a fragment of the output document. For instance, the very 
first rule in Figure 1 matches a recipe element and creates an html 
element with a body within it. The content of the body element is 
then composed of: an h1 header, which is obtained by applying the
templates to the children title elements within the recipe, a list 
of ingredients, and the description for the preparation. The rest of 
the stylesheet contains the remaining templates for the other ele- 
ments within a recipe. 

The template based approach of XSLT is similar to a functional 
approach in the sense that templates operate without side effects 


<xsl:stylesheet>
  <xsl:template match="recipe">
    <html>
      <body>
        <h1><xsl:apply-templates select="title"/></h1>
        <ul><xsl:apply-templates select="ingredient"/></ul>
        <xsl:apply-templates select="preparation"/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="ingredient">
    <xsl:param name="num" select="count(ingredient)"/>
    <li><xsl:value-of select="@name"/></li>
    <ul>
      <xsl:apply-templates
          select="ingredient[position() le $num]">
        <xsl:with-param name="num" select="$num - 1"/>
      </xsl:apply-templates></ul>
    <xsl:apply-templates select="preparation"/>
  </xsl:template>
  <xsl:template match="preparation">
    <ol><xsl:apply-templates select="step"/></ol>
  </xsl:template>
  <xsl:template match="step">
    <li><xsl:value-of select="text()|node()"/></li>
  </xsl:template>
</xsl:stylesheet>


Figure 1: Recipes stylesheet. 


over their input parameters. However, there are also some important
differences. Notably, templates are not called explicitly within
the stylesheet; instead, the xsl:apply-templates instruction
applies the whole set of templates to the selected nodes. The
template actually triggered is decided using a set of rules specified
as part of the semantics of XSLT. In case of conflicts, XSLT
provides a resolution mechanism based on template priority that
always selects a unique template. On top of this built-in semantics,
the user can partially control how the templates are triggered using
the notion of mode: the user can associate a given template with a
mode, and invoke xsl:apply-templates with a particular mode.


2.2 Compilation approach 


The close relationship between XSLT and XQuery makes some 
of the compilation easy. Notably XQuery and XSLT share XPath 
2.0 as a subset.¹ In addition, XPath 2.0 expressions are used only in
specific locations within a stylesheet, which facilitates their iden- 
tification during compilation into XQuery. Note that the reverse 
translation would be more difficult because XQuery can arbitrarily 
compose XPath expressions with other kinds of expressions. To a
first approximation, compiling an XPath 2.0 expression to XQuery
1.0 is essentially applying the identity function. As we will see, 
this is not entirely true, since some care is needed to make sure 
the resulting XPath expression will operate over the proper input 
context. Nonetheless, the principle applies, which facilitates the 
translation, and makes the resulting XQuery easier to read and edit 
for a programmer. 

Dealing with the rule-based execution model of XSLT is the 
main challenge that must be tackled when compiling stylesheets 
to XQuery. First, although xsl:apply-templates may resem- 
ble a function call, its semantics does not correspond to explicit 
function calls, but instead relies on a kind of dynamic dispatch 
based on pattern matching, template priority, import precedence, 
and modes. Second, the notions of pattern matching and implicit 
context item at each point of the evaluation of a stylesheet do not 
exist in XQuery. Third, template parameters, as opposed to XQuery
function parameters, may be optional. In this section, we focus
on how our compilation addresses these three issues by translating
the xsl:template and xsl:apply-templates instructions
in the example of Figure 1.


¹ Note that we do not consider here the compilation of XSLT 1.0,
which would require the treatment of backward compatibility issues
with XPath 1.0 [1].

Fortunately, with the proper care, the template-based approach 
of XSLT can be implemented using XQuery user-defined functions. 
The main idea here is to create an explicit function for each tem- 
plate, and to replace each xsl:apply-templates instruction by 
an XQuery function call to the generated XQuery function perform- 
ing the proper explicit dynamic dispatch. 

For each kind of XSLT component, we apply the following
compilation principles: 


• XQuery variables are used to model the XSLT context.


• Relative XPath expressions that implicitly depend on the current
context item, position or size are translated into equivalent
absolute expressions (prefixed by either a function call or
a variable) that do not depend on the implicit context.


• XSLT match patterns are translated into an equivalent combination
of standard XPath expressions with conditionals.


• XSLT template definitions are compiled into XQuery user-defined
functions.


• xsl:apply-templates instructions are compiled into function calls
to a generated XQuery function which consists of a combination
of XQuery’s conditional expressions to model XSLT’s
dynamic dispatch, and calls to the appropriate XQuery function
for the corresponding templates.


2.3 Step by step translation 


In the rest of the section, we illustrate each of those principles on 
concrete examples extracted from the recipe stylesheet. In what 
follows, we will use the namespace prefix t2q for variables and 
functions used by our compiler. 


Context and relative path expressions 


XSLT uses a notion of context to implicitly pass parameters between
templates during the evaluation. XQuery also supports a notion
of context. However, that context cannot be bound explicitly.
In order to deal with that issue, and also to avoid unwanted
interactions between the XSLT context and the XQuery context, we use
explicit variables to model the XSLT context. Those variables are
$t2q:dot for the context item, $t2q:pos for the context position,
and $t2q:last for the context size.

Each relative path expression within the original stylesheet must 
be prefixed by the appropriate bindings to the context variables. For 
example, the relative path expression description is translated 
into $t2q:dot/description. How the input context is passed
to a path expression depends on how that expression is
constructed. For instance, the translation of the expression
count(ingredient/ingredient) is the slightly more complex:


count($t2q:dot/ingredient/ingredient)


Here, the context item is passed on the path within the function
call.


Match patterns 


The notion of match pattern does not exist in XQuery. Therefore 
the compiler translates match patterns into equivalent XPath ex- 
pressions by reversing the pattern. A node matches a pattern if it 
belongs to the list of nodes that this pattern can select. 




For instance, the match pattern recipe is translated into the path
expression exists(self::recipe), which returns true iff the input
node is a recipe element.

One subtlety is that, to obtain the right semantics without negatively
impacting performance, the patterns need to be reversed. For
instance, the pattern recipe/title has to be reversed into a path
expression of the following form:


exists(self::title[parent::recipe])


The more complex XPath expression


people/person[@name="John Doe"]//phone


is reversed into


exists(self::phone[
  ancestor::person[@name="John Doe"]/
    parent::people])


The detailed translation of patterns can be somewhat involved in
some cases. Attribute patterns must not be translated into an
attribute axis step, as it would not match an input node of type
attribute, but would return attributes of that input node. Using the
self axis would not work either, since it would only select the
current node if it is an element. Therefore, @name is translated into
the more complex


exists((.)[. instance of attribute(name)])


Similarly, the pattern @*:name is translated into the more explicit


exists((.)[(. instance of attribute())
           and (local-name(.) eq "name")])


Finally, special attention must be paid to the translation of patterns
whose steps contain positional predicates. For example,


people/person[2]/address


is matched by the address of the second person. The simple
translation


self::address[parent::person[2]/parent::people]


would be wrong, as it would not match any elements (because
parent::person[2] would not select any elements). A correct
translation must recover the position by going up then down the
tree, as follows:


exists(self::address[parent::person[
  parent::node()/person[2]=.]/parent::people])
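

The reversal of simple patterns can be sketched in a few lines of Python. This is only an illustration of the idea, not the compiler described in this paper: the function names split_steps and reverse_pattern are hypothetical, and the sketch assumes patterns made of element steps separated by / or //, with non-positional predicates only. Attribute patterns, positional predicates, unions and absolute patterns need the special treatment discussed above and are not handled.

```python
def split_steps(pattern):
    """Split a pattern into its steps and the separators between them.

    Separators inside predicates ([...]) or string literals are ignored.
    """
    steps, seps = [], []
    current, depth, quote = "", 0, None
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if quote:                         # inside a string literal
            current += ch
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            current += ch
        elif ch == "[":
            depth += 1
            current += ch
        elif ch == "]":
            depth -= 1
            current += ch
        elif ch == "/" and depth == 0:    # step separator outside predicates
            steps.append(current)
            if pattern[i:i + 2] == "//":
                seps.append("//")
                i += 1
            else:
                seps.append("/")
            current = ""
        else:
            current += ch
        i += 1
    steps.append(current)
    return steps, seps


def reverse_pattern(pattern):
    """Turn a simple match pattern into a test on the candidate node."""
    steps, seps = split_steps(pattern)
    upward = []
    # Walk the pattern right to left: '/' becomes parent::, '//' ancestor::.
    for k in range(len(steps) - 2, -1, -1):
        axis = "parent" if seps[k] == "/" else "ancestor"
        upward.append(f"{axis}::{steps[k]}")
    test = f"self::{steps[-1]}"
    if upward:
        test += "[" + "/".join(upward) + "]"
    return f"exists({test})"
```

Applied to the three patterns used in this section (recipe, recipe/title, and the people/person/phone example), the sketch produces the same reversed expressions as shown above, up to whitespace.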


Templates 


Templates are translated into equivalent XQuery functions. The 
signature of these functions includes the context node, the context 
position, the context size and the list of parameters declared in the 
template. For example, the following template 


<xsl:template match="collection/description">
  <xsl:value-of select="text()"/>
</xsl:template>


is translated to


declare function t2q:template1(
  $t2q:dot as node(),
  $t2q:pos as xs:integer,
  $t2q:last as xs:integer,
  $t2q:mode as xs:string)
{
  (text {string-join(
     for $t2q:d in data($t2q:dot/child::text())
     return ($t2q:d cast as xs:string), ' ')})
};


Template application 


Dealing with xsl:apply-templates is the most complex part
of the translation. The evaluation of an xsl:apply-templates
instruction consists of first evaluating the XPath selection
associated with it, and second looking for a template that matches
each selected node. All templates whose mode is the one attached to
the xsl:apply-templates instruction are considered. Whenever
several templates match the same node, the winner is the one
with the highest priority. Basically, the translation consists of the
following steps:


• Definition of the XQuery applyTemplates function, which
implements the processing described above, i.e., looking for
the correct template function to call.


• Translation of each xsl:apply-templates instruction
into an XQuery function call to the generic applyTemplates
function, once for each node selected by the XPath selection
associated with the xsl:apply-templates instruction.


The applyTemplates function can be broken down in two main 
pieces: template ordering and parameter binding. We first describe 
these two pieces and then we put them together. 


Dealing with priority 


The search for the template to instantiate depends on the template’s 
priority and mode. Templates are ordered according to their import 
precedence and priority. The latter is either specified by the user 
through the priority attribute on the template or computed by 
analyzing the syntax of the template’s pattern [7, §6.4]. Templates 
are picked up according first to their import precedence and then 
their priority, both statically known. 
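

Because both keys are static, the ordering can be computed once at compile time. The following Python sketch illustrates this under stated assumptions: the Template record and its field names are hypothetical, priorities are assumed to be already resolved to numbers (the computation of default priorities from the pattern syntax is not shown), and ties are broken by preferring the template declared last, which is the optional recovery action in XSLT rather than something this paper prescribes.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str               # label used only for this illustration
    import_precedence: int  # higher wins
    priority: float         # explicit or precomputed default priority
    decl_order: int         # position in the stylesheet


def order_templates(templates):
    """Sort templates so that the first match in the list is the winner."""
    return sorted(
        templates,
        key=lambda t: (t.import_precedence, t.priority, t.decl_order),
        reverse=True,
    )


ordered = order_templates([
    Template("imported-recipe", 1, 0.5, 0),
    Template("recipe", 2, 0.0, 1),
    Template("wildcard", 2, -0.5, 2),
])
print([t.name for t in ordered])
```

With the templates in this order, a chain of conditionals that tests each template in turn selects the right one by construction.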


Dealing with parameter binding


An important mismatch between XQuery function calls and XSLT’s 
xsl:apply-templates is that the latter can be called with im- 
plicit parameters. For example, the evaluation of the instruction 


<xsl:apply-templates select="ingredient"/> 


may pass the default value of parameter "num" implicitly to the 
evaluation of the "ingredient" template. Default function pa- 
rameters do not exist in XQuery. Therefore, when invoking the 
generated XQuery applyTemplates function, parameters must be 
fully bound, either by using the value specified in connection with 
the xsl:apply-templates instruction (via xsl:with-param) 
or by using the special generated variable $UNDEFINED to indicate
that the default value of the parameter should be used. For example,


<xsl:apply-templates select="ingredient"/>


without an explicit parameter in the recipe template is translated to


let $t2q:sequence := $t2q:dot/child::ingredient
return
  let $t2q:last := count($t2q:sequence)
  return
    for $t2q:dot at $t2q:pos in $t2q:sequence
    return
      t2q:applyTemplates($t2q:dot, $t2q:pos,
                         $t2q:last, '#default',
                         $t2q:UNDEFINED)


whereas


<xsl:apply-templates
    select="ingredient[position() le $num]">
  <xsl:with-param name="num" select="$num - 1"/>
</xsl:apply-templates>


in the ingredient template is translated to


let $t2q:sequence
  := $t2q:dot/child::ingredient[position() le $num]
return
  let $t2q:last := count($t2q:sequence)
  return
    for $t2q:dot at $t2q:pos in $t2q:sequence
    return
      t2q:applyTemplates($t2q:dot, $t2q:pos,
                         $t2q:last, '#default',
                         $num - 1)
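

The two translations above follow one scheme, which can be sketched as a small Python code generator. The function compile_apply_templates and its arguments are hypothetical names chosen for this illustration; the sketch assumes that the select expression and the xsl:with-param values have already been translated to XQuery with an explicit context, and that param_names lists the distinct template parameter names in their positional order.

```python
def compile_apply_templates(select, with_params, param_names, mode="#default"):
    """Emit the XQuery text for one xsl:apply-templates instruction.

    select      -- already translated selection, e.g. "$t2q:dot/child::ingredient"
    with_params -- dict from parameter name to translated XQuery expression
    param_names -- all distinct template parameter names, in positional order
    """
    # Parameters not passed explicitly are bound to the UNDEFINED marker.
    args = [with_params.get(name, "$t2q:UNDEFINED") for name in param_names]
    arg_list = ", ".join(
        ["$t2q:dot", "$t2q:pos", "$t2q:last", f"'{mode}'"] + args)
    return (
        f"let $t2q:sequence := {select}\n"
        "return\n"
        "  let $t2q:last := count($t2q:sequence)\n"
        "  return\n"
        "    for $t2q:dot at $t2q:pos in $t2q:sequence\n"
        "    return\n"
        f"      t2q:applyTemplates({arg_list})"
    )


print(compile_apply_templates("$t2q:dot/child::ingredient", {}, ["num"]))
```

Passing {"num": "$num - 1"} as with_params yields the second form, in which the last argument is the explicit value instead of the marker.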


Dealing with xsl:apply-templates


In addition to context information, the signature of the generated
XQuery applyTemplates function has as many parameters as
there are template parameters with distinct names. The position of
each of these additional parameters uniquely identifies the template
parameter name it represents. Thus the applyTemplates function
has all the information required to call the appropriate template
function with all its parameters bound. The following generated
XQuery fragment illustrates this:


declare function t2q:applyTemplates(
  $t2q:dot as node(),
  $t2q:pos as xs:integer,
  $t2q:last as xs:integer,
  $t2q:mode as xs:string,
  $t2q:param0)
{
  (: ... :)
  t2q:template3(
    $t2q:dot,
    $t2q:pos,
    $t2q:last,
    $t2q:mode,
    typeswitch ($t2q:param0)
      case $t2q:a as comment() return (
        if (($t2q:a is $t2q:UNDEFINED))
        then count(($t2q:dot/child::ingredient))
        else $t2q:param0)
      default return $t2q:param0)
  (: ... :)
}


Notice the test needed to figure out whether the default parameter 
value should be used. 
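

The same idea, a unique marker value recognized by identity rather than by equality, can be shown in Python. This is an analogy only: the names UNDEFINED and resolve_param are hypothetical, and Python object identity stands in for the XQuery node identity test (is) applied to the generated comment node.

```python
# A unique marker object; no caller-supplied value can be identical to it.
UNDEFINED = object()


def resolve_param(value, default):
    """Return value, or the lazily computed default if value is the marker.

    default is a zero-argument callable, so it is evaluated only when
    needed, just as the default expression is evaluated only in the
    'then' branch of the generated conditional.
    """
    return default() if value is UNDEFINED else value


print(resolve_param(UNDEFINED, lambda: 3))  # default is used
print(resolve_param(2, lambda: 3))          # explicit value is kept
```

Testing identity rather than equality matters: an explicit argument that happens to look like the marker, for example another comment node with the same content, must not trigger the default.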

Finally we can outline the function applyTemplates, which is 
defined as follows (in pseudo-code): 


declare function applyTemplates(
  $dot as node()?,
  $pos as xs:integer,
  $last as xs:integer,
  $mode as xs:string,
  $param1,
  ...,
  $paramN)
{
  if ($mode = mode template 1
      and fn:exists(select template 1))
  then template1
    ($dot, $pos, $last, $mode,
     typeswitch ($param1)
       case $a as comment() return
         if ($a is $UNDEFINED)
         then default value param 1 template 1
         else $param1
       default return $param1,
     ...
     typeswitch ($paramN)
       case $a as comment() return
         if ($a is $UNDEFINED)
         then default value param N template 1
         else $paramN
       default return $paramN
    ) (: end of template1 function call :)
  ...
  else if ($mode = mode template N
           and exists(select template N))
  then templateN($dot, $pos, $last, $mode, ...)
  else
    builtInApplyTemplates($dot, $pos, $last, $mode,
                          $param1, ..., $paramN)
}


where builtInApplyTemplates is a function that calls XSLT
built-in templates. The applyTemplates function takes as parameters
the current context node, the current context position, the
current context size, the current mode, and a list of N parameters
of type item()*, namely $param1, ..., $paramN, where N is the number
of distinct names of parameters defined by templates of the stylesheet.
All parameter names of a template are thus mapped into positional
names, from 1 to N.


3. FROM RULE-BASED EXECUTION TO 
XQUERY FUNCTIONS 


At the heart of our compiler is the ability to translate the rule- 
based execution style of XSLT into the “pure” functional XQuery 
approach. In this section, we formally present three kinds of trans- 
lation rules that achieve this goal, following the approach described 
in Section 2. First, templates are mapped into XQuery function de- 
finitions. The second kind of translation rules, called XQuery Ap- 
plicator Function Generators (XAFG), generates XQuery functions 
that encode, in XQuery, all the implicit rules for template selection, 
execution and conflict resolution. Finally, another set of translation 
rules describes how XSLT applicators (xsl:apply-templates,
xsl:apply-imports and xsl:next-match) are converted into
XQuery by invoking XQuery functions generated by the XAFG 
translation rules. 


Notations. In this section, we formally describe the compilation 
from XSLT to XQuery with a set of translation rules, in the style of 
the XPath 2.0 and XQuery 1.0 Formal Semantics. Each translation 
rule takes part of an XSLT 2.0 stylesheet as input, and produces part 
of an XQuery expression as output. We use the following notations 
for the translation rules: 


[XSLT stylesheet]_Const == XQuery


where Const denotes the translation function name. 


3.1 From template definitions to XQuery 
functions 


Template definition 


<xsl:template
  match? = pattern
  name? = qname
  priority? = number
  mode? = tokens
  as? = sequence-type>
  <!-- Content:
       (xsl:param*, sequence-constructor) -->
</xsl:template>


The constructor xsl:template defines a transformation rule
based either on a name (when the attribute name is specified) and/or
on a source document (when the attribute match is specified).


Translation rules 


Templates with a match attribute can be statically and completely
ordered according to their import precedence as defined in [7, §6.4]
and their priority (either explicitly specified, or, if absent, computed
by analysing the syntax of their match pattern as specified in [7,
§6.4]). In the remainder of this paper, we assume that templates
with a match attribute have been sorted according to their import
precedence and their priority. (template_1, ..., template_n)
denotes the sorted list of templates with match attributes in the
input stylesheet. The translation rule of the i-th template is as follows:


[<xsl:template match='pattern' priority='number'
               mode='token_1 ... token_N' as='type'>
   xsl:param_1 ... xsl:param_n
   sequence-constructor
 </xsl:template>]_Const
==
declare function t2q:template_i(
  $t2q:dot as node(),
  $t2q:pos as xs:integer,
  $t2q:last as xs:integer,
  $t2q:mode as xs:token,
  [xsl:param_1]_Const, ..., [xsl:param_n]_Const)
  as type
{
  [sequence-constructor]_Const
}


The information required to instantiate a template must be passed
as parameters to the generated XQuery function. The current focus
is specified by the parameters $dot, for the current context node,
$pos, for the current context position, and $last, for the current
context size. The $mode parameter indicates the mode in which the
template is being instantiated. In XSLT, modes allow a node to be
processed multiple times. In XSLT 1.0, the mode in which a given
template is instantiated is always statically known, but this no
longer holds in XSLT 2.0, where the following is valid:


<xsl:template match='slide' mode='#all'>
  <xsl:apply-templates mode='#current'/>
</xsl:template>


#all denotes all possible modes; #current denotes the current
template mode. In general, it is no longer possible to statically
reduce the list of templates that can be applied based upon
the mode attribute. Thus, by default, all templates need to be
considered and the current mode passed as an argument to the generated
XQuery functions corresponding to XSLT templates.
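

The run-time mode test that results from this can be sketched as follows. The helper names mode_matches and resolve_call_mode are hypothetical, and the sketch assumes that a template without a mode attribute is recorded with the single mode #default.

```python
def mode_matches(template_modes, current_mode):
    """True if a template declared with template_modes applies in current_mode."""
    return "#all" in template_modes or current_mode in template_modes


def resolve_call_mode(requested_mode, current_mode):
    """Mode passed on by xsl:apply-templates: #current keeps the caller's mode."""
    return current_mode if requested_mode == "#current" else requested_mode


# The slide template above applies in any mode and propagates that mode.
print(mode_matches(["#all"], "toc"))
print(resolve_call_mode("#current", "toc"))
```

Since the value returned by resolve_call_mode is only known when the enclosing template runs, the mode has to travel as an ordinary function argument, which is why every generated template function carries a $t2q:mode parameter.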

Finally, template parameters are translated by extracting from 
their definition their name and type as follows (note that tunnel 
parameters are not supported yet, see Section 4):


[<xsl:param
   name = qname
   select? = expression
   as? = sequence-type
   required? = "yes" | "no">
   <!-- Content: sequence-constructor -->
 </xsl:param>]_Const
==
$qname as sequence-type


If the as attribute is not specified, sequence-type is replaced by
item()*.
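

As an illustration of these two rules, the following Python sketch builds the signature of a generated template function. The names param_decl and template_signature are hypothetical, and the sketch assumes that each template parameter is declared in XQuery as a variable with the same name as the XSLT parameter.

```python
def param_decl(name, seq_type=None):
    """Translate one xsl:param into an XQuery parameter declaration."""
    # A missing 'as' attribute defaults to the most general type.
    return f"${name} as {seq_type or 'item()*'}"


def template_signature(index, params, return_type=None):
    """Build the signature of the function generated for the index-th template.

    params is a list of (name, sequence type or None) pairs.
    """
    decls = [
        "$t2q:dot as node()",
        "$t2q:pos as xs:integer",
        "$t2q:last as xs:integer",
        "$t2q:mode as xs:token",
    ] + [param_decl(name, seq_type) for name, seq_type in params]
    signature = f"declare function t2q:template{index}(" + ", ".join(decls) + ")"
    if return_type:
        signature += f" as {return_type}"
    return signature


print(template_signature(3, [("num", None)]))
```

The four context parameters always come first, so every generated template function can be called uniformly from the applicator function.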

Templates that do not specify a match attribute are translated 
into XQuery functions in a similar manner, and their invocation, 
through xsl:call-template, simply corresponds to an XQuery
function call to the appropriate generated function: 


[<xsl:template name='qname'
               mode='token_1 ... token_N' as='type'>
   xsl:param_1 ... xsl:param_n
   sequence-constructor
 </xsl:template>]_Const
==
declare function [.]_fctname(
  $dot as node(),
  $t2q:pos as xs:integer,
  $t2q:last as xs:integer,
  $t2q:mode as xs:token,
  [xsl:param_1]_Const, ..., [xsl:param_n]_Const,
  $t2q:impPrec as xs:integer,
  $t2q:priority as xs:integer) as type
{
  [sequence-constructor]_Const
}


where ‘.’ represents the template constructor being translated
and []_fctname is a function that generates unique XQuery function
names for a given XSLT instruction:


[instruction]_fctname = concat("instruction", $decl_order)


Note that the two additional parameters of the generated function
are used to explicitly pass the import precedence and priority
of the current template through an xsl:call-template. Note
also that as explained in Section 2.3, relative XPath expressions are
translated into equivalent expressions where the context is explicit.


3.2 Capturing applicator logic in XQuery 

Generating an XQuery expression that corresponds to the body
of a given XSLT template is just the first step toward translating
XSLT stylesheets. Once the logic for selection and invocation
with the right parameter values is in place, the final XQuery
function must be created. This section defines a set of translation
rules that generate, for each applicator (xsl:apply-templates,
xsl:apply-imports and xsl:next-match), an XQuery function
which explicitly captures the selection, conflict resolution and
invocation logic.


Definition 


The evaluation of an applicator instruction is performed in three 
steps. First a selector XPath selects a sequence of nodes; then, for 
each selected node, the template that is best matched by the node 
is selected; finally, the selected template or a built-in template, if 
no user-defined templates are matched by the selected node, is in- 
voked. The applicators only differ by the set of considered tem- 
plates in the second step. 

For simplicity of the presentation, we only focus here on one ap- 
plicator, xsl:apply-templates. In appendices B.3 and B.2, we 
briefly present minor adjustments to the translation rules to handle 
xsl:apply-imports and xsl:next-match. 

The xsl:apply-templates content model is: 


<xsl:apply-templates
  select? = expression
  mode? = token>
  <!-- Content:
       (xsl:sort | xsl:with-param)* -->
</xsl:apply-templates>


Applicator translation rules


We give the translation rules to generate an XQuery function (re- 
ferred to as an XQuery applicator function) that implements an ap- 
plicator selection, along with the proper conflict resolution and in- 
vocation logic. 

An XQuery applicator function takes as parameters the current
context node, the current context position, the current context size,
the current mode, and a list of p parameters of type item()*
(param_1, ..., param_p), where p is the number of distinct names of
XSLT template parameters defined by templates with a match
attribute in the input stylesheet. Basically, a parameter name of a
template is mapped to a position from 1 to p by the []_ParamName2Pos
function. The function []_Pos2ParamName is the inverse function of
[]_ParamName2Pos.
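

A minimal Python sketch of this pair of mappings is given below. The function build_param_positions is a hypothetical name, and numbering the names in order of first occurrence is an assumption made for the illustration; any fixed one-to-one numbering from 1 to p would serve.

```python
def build_param_positions(templates_params):
    """Map distinct template parameter names to positions 1..p and back.

    templates_params is a list with, for each template, the list of its
    parameter names.
    """
    name_to_pos = {}
    for params in templates_params:
        for name in params:
            if name not in name_to_pos:
                # First occurrence of this name: give it the next position.
                name_to_pos[name] = len(name_to_pos) + 1
    pos_to_name = {pos: name for name, pos in name_to_pos.items()}
    return name_to_pos, pos_to_name


name_to_pos, pos_to_name = build_param_positions([["num"], [], ["num", "depth"]])
print(name_to_pos, pos_to_name)
```

Two templates that declare a parameter with the same name share one position, which is what keeps p equal to the number of distinct names rather than the total number of declarations.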

The translation rule that generates an XQuery applicator function
takes as input the sequence of templates with a match attribute in the
input stylesheet sorted according to their import precedence and
priority. The body of an XQuery apply-templates function consists
of a large nested if-then-else expression that selects the first
template whose mode matches the $t2q:mode parameter and whose
match pattern is matched by the node referenced by the $t2q:dot
parameter:


[(template_1, ..., template_n)]
==
declare function t2q:applyTemplates(
  $t2q:dot as node()?,
  $t2q:pos as xs:integer,
  $t2q:last as xs:integer,
  $t2q:mode as xs:string,
  $t2q:param_1 as item()*,
  ...,
  $t2q:param_p as item()*)
  as item()*
{
  if ($t2q:mode = [template_1]_mode
      and exists([template_1]_toselect))
  then [template_1]_invoke
  ...
  else if ($t2q:mode = [template_n]_mode
           and exists([template_n]_toselect))
  then [template_n]_invoke
  else
    t2q:builtInApplyTemplates($t2q:dot, $t2q:pos,
                              $t2q:last, $t2q:mode,
                              $t2q:param_1, ..., $t2q:param_p)
};


where the rule [[ ]]_mode generates a sequence of tokens corresponding
to the modes specified by a given template. In appendix B.1, we
formally define the t2q:builtInApplyTemplates() function, which
encodes all built-in templates. This translation rule does not
attempt to detect nodes matching two or more templates of the
highest import precedence and priority. This could trivially be done
by adding additional tests.
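
The selection logic of the generated applicator function can be modelled outside XQuery. The following Python sketch is only an illustration under simplifying assumptions: nodes are represented by their element names, and the Template class, its fields and the built_in placeholder are hypothetical names that do not appear in the translation rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Template:
    modes: Set[str]                  # role of [[template]]_mode
    matches: Callable[[str], bool]   # role of exists([[template]]_toselect)
    invoke: Callable[[str], str]     # role of [[template]]_invoke
    import_precedence: int = 0
    priority: float = 0.0


def sort_templates(templates: List[Template]) -> List[Template]:
    # Highest import precedence first, then highest priority.
    return sorted(templates,
                  key=lambda t: (t.import_precedence, t.priority),
                  reverse=True)


def built_in(node: str, mode: str) -> str:
    # Placeholder for t2q:builtInApplyTemplates.
    return "built-in:" + node


def apply_templates(sorted_templates: List[Template], node: str, mode: str) -> str:
    # Mirrors the nested if-then-else: the first template whose mode
    # and pattern both match is invoked, otherwise the built-in rule.
    for t in sorted_templates:
        if mode in t.modes and t.matches(node):
            return t.invoke(node)
    return built_in(node, mode)
```

Because the list is sorted before the function is generated, conflict resolution reduces to taking the first match.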

The [[ ]]_toselect function is used in the previous rule to translate
the notion of match pattern, which does not exist in XQuery. The
semantics of a match pattern is specified in XSLT 2.0 [7, §5.5.3]
by a translation rule from a match pattern into a valid XPath
expression. However, the evaluation of the generated XPath
expression is inefficient because it does not simply test the
pattern on a given candidate node: it first evaluates an absolute
XPath expression containing at least one descendant axis, and then
tests whether the candidate is in the resulting sequence. Our
efficient translation, based on reversing the pattern, is illustrated
in section 2.3 and formally described in appendix A.
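
The difference between the two evaluation strategies can be sketched in Python with the standard xml.etree.ElementTree module. The sketch assumes patterns made only of child steps on element names (such as section/title); the function names are hypothetical, and the parent map stands in for the parent axis, which ElementTree does not provide.

```python
import xml.etree.ElementTree as ET


def parent_map(root):
    # ElementTree keeps no parent pointers, so build them once.
    return {child: parent for parent in root.iter() for child in parent}


def matches_naive(root, node, steps):
    # Naive strategy: find every node selected by the pattern anywhere
    # in the document, then test whether the candidate is among them.
    rest = "/".join(steps[1:])
    for start in root.iter(steps[0]):
        hits = start.findall(rest) if rest else [start]
        if any(hit is node for hit in hits):
            return True
    return False


def matches_reversed(parents, node, steps):
    # Reversed strategy: test the last step on the candidate itself,
    # then walk up the parent axis for the remaining steps.
    current = node
    for name in reversed(steps):
        if current is None or current.tag != name:
            return False
        current = parents.get(current)
    return True
```

The naive version scans the whole document for every candidate node, whereas the reversed version visits at most as many nodes as the pattern has steps.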

To complete our translation, we formally define the [[ ]]_invoke
rule. It generates a function call to the XQuery function
corresponding to a given XSLT template. Unlike XSLT, which allows
optional template parameters, XQuery makes all function parameters
mandatory. Therefore, when invoking an XQuery function, all its
parameters must be explicitly bound. To handle XSLT optional
parameters, we must be able to 1) detect that an XSLT applicator has
been called without explicitly specifying the value of an optional
parameter, and 2) call the XQuery template function with the
default value of the missing parameter. The former goal is reached by
always binding unspecified parameters (i.e., parameters not present
in the list of xsl:with-param of the considered XSLT applicator)
of an XQuery applicator function call to the special global variable
$t2q:UNDEFINED. The [[ ]]_invoke translation rule, specified below,
achieves the latter objective (the input template is assumed to be at
the i-th position in the sorted list of templates).


[[ <xsl:template name='qname'
       mode='token_1 ... token_l' as='type'>
     <xsl:param name='name_1' .../>
     ...
     <xsl:param name='name_k' .../>
     sequence-constructor
   </xsl:template> ]]_invoke          (with k <= p)
=
t2q:template_i(
    $t2q:dot,
    $t2q:pos,
    $t2q:last,
    $t2q:mode,
    typeswitch ($t2q:param_[[name_1]]_ParamName2Pos)
      case $t2q:a as comment()
        return
          if ($t2q:a is $t2q:UNDEFINED)
          then [[xsl:param_1]]_defaultValue
          else $t2q:param_[[name_1]]_ParamName2Pos
      default
        return $t2q:param_[[name_1]]_ParamName2Pos,
    ...,
    typeswitch ($t2q:param_[[name_k]]_ParamName2Pos)
      case $t2q:a as comment()
        return
          if ($t2q:a is $t2q:UNDEFINED)
          then [[xsl:param_k]]_defaultValue
          else $t2q:param_[[name_k]]_ParamName2Pos
      default
        return $t2q:param_[[name_k]]_ParamName2Pos)


where the [[ ]]_defaultValue rule generates the default value of a
given xsl:param and $t2q:UNDEFINED is defined as:


declare variable $t2q:UNDEFINED as comment()
  {comment {
     "undeclared variable used
      for node identity test"
  }};


[[ <xsl:param name='name' select='expr'/> ]]_defaultValue
=
[[expr]]_Expr
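
The role of $t2q:UNDEFINED can be illustrated with a sentinel object in Python. This is a sketch of the idea only: the template_heading function, its parameters and their default values are hypothetical, and the identity test with "is" plays the part of the node identity test in the rule above.

```python
UNDEFINED = object()  # plays the role of $t2q:UNDEFINED


def resolve(argument, default_thunk):
    # Identity test, as in `$t2q:a is $t2q:UNDEFINED`; the default is
    # only computed when the caller supplied no value.
    return default_thunk() if argument is UNDEFINED else argument


def template_heading(dot, level, label):
    # Stand-in for a generated template function: every parameter is
    # mandatory, as in XQuery.
    return f"{label}{'#' * level} {dot}"


def invoke_heading(dot, param_level, param_label):
    # Stand-in for [[ ]]_invoke: each argument is bound explicitly,
    # falling back to the declared default when it is UNDEFINED.
    return template_heading(
        dot,
        resolve(param_level, lambda: 1),
        resolve(param_label, lambda: ""),
    )
```

A dedicated sentinel is needed because an empty sequence is a legitimate parameter value and must not be confused with a missing one.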


3.3 Invoking XQuery applicator functions 


After formalizing the translation rules that generate XQuery
applicator functions, we are now ready to present rules that generate,
for each instance of an XSLT applicator, a function call to the
appropriate XQuery applicator function.

These rules are as follows:


[[ <xsl:apply-templates select='expr' mode='mode'>
     xsl:with-param*
   </xsl:apply-templates> ]]_const
=
let $t2q:sequence := [[expr]]_Expr return
let $t2q:inner-last := count($t2q:sequence)
return
  for $t2q:inner-dot
      at $t2q:inner-pos in $t2q:sequence
  return
    t2q:applyTemplates(
      $t2q:inner-dot, $t2q:inner-pos,
      $t2q:inner-last,
      'mode',
      [[1]]_ParamValue(xsl:with-param*),
      ...,
      [[p]]_ParamValue(xsl:with-param*))


where [[k]]_ParamValue(xsl:with-param*) returns, for a given
position k (recall that a position uniquely identifies a parameter
name), the value specified in the xsl:with-param list if it exists;
otherwise it returns the global variable $t2q:UNDEFINED. Note that if
the mode is #current, then the translation is the same except that
'mode' is replaced by $t2q:mode. Formally,
[[ ]]_ParamValue(xsl:with-param*) is defined as follows:


[[ <xsl:with-param name='name' .../> ]]_name = "name"


[[k]]_ParamValue(xsl:with-param*)
=
if ([[xsl:with-param_1]]_name = [[k]]_Pos2ParamName)
then [[xsl:with-param_1]]_const
else
...
if ([[xsl:with-param_n]]_name = [[k]]_Pos2ParamName)
then [[xsl:with-param_n]]_const
else $t2q:UNDEFINED


Appendices B.2 and B.3 show how these rules can be modified
to translate template instantiations with xsl:next-match and
xsl:apply-imports.
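
The way the positional argument list of an applicator call is assembled can be sketched in Python. The function names are hypothetical, and assigning positions in alphabetical order is an assumption made for this sketch: the rules only require a fixed bijection between parameter names and positions 1 to p.

```python
def param_positions(template_param_names):
    # Role of [[ ]]_ParamName2Pos: distinct names -> positions 1..p.
    names = sorted(set(template_param_names))
    return {name: pos for pos, name in enumerate(names, start=1)}


def applicator_arguments(positions, with_params):
    # with_params maps a parameter name to the already translated
    # XQuery expression text of its xsl:with-param.
    by_position = {pos: name for name, pos in positions.items()}  # Pos2ParamName
    return [
        with_params.get(by_position[k], "$t2q:UNDEFINED")
        for k in range(1, len(by_position) + 1)
    ]
```

For a stylesheet whose templates declare the parameters depth and title, an applicator that passes only title yields the argument list $t2q:UNDEFINED, 'Intro'.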


4. INTEROPERABILITY AND ISSUES 


Our translation from XSLT to XQuery highlights the differences 
between XSLT and XQuery. In this section we summarize our ex- 
perience with some of these. 


Tunneling parameters. XSLT 2.0 allows template parameters
declared with the special tunnel='yes' attribute to “pass
through” to all templates that are called while the parameter
binding is dynamically in effect, even through templates that do not
declare the parameter, including templates that are called from other
modules [7, §10.1.2].

Since the collection of tunneling parameters is known statically 
for a complete stylesheet (including all the imported modules), this 
can be implemented by adding as parameters the list of the current 
value of all potentially tunneled parameters to all the functions gen- 
erated for all templates. Unfortunately the need for the complete 
stylesheet prevents separate compilation of imported stylesheets. 

An alternative is to add a single “dynamic environment” parame- 
ter to each template which contains a representation (in XML) of 
the currently bound tunneled parameters. This, however, requires 
translating parameter access into XPath expressions that select a 
fragment of the dynamic environment object. 


Dynamic sort specification. The xsl:sort instruction in
XSLT 2.0 permits several aspects of sorting to be specified
using attribute value templates (AVTs) computed dynamically
at runtime. The XQuery 1.0 order by clause does not allow
this, and does not allow dynamic composition of sort modes.
This means that xsl:sort has to be compiled into code that tests
the values of the AVT attributes and branches at runtime to a for
expression with the appropriate order by sort specification. Only
one attribute cannot be supported in this way: setting the collation
for sorting to a computed value cannot be translated into XQuery.
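
As an illustration of this branching, the following Python sketch emits XQuery text for a sort whose order attribute is an AVT. It is not the rule used by our compiler: the function name, the $t2q:x variable and the choice to treat any value other than 'descending' as ascending are assumptions made for the example.

```python
def compile_dynamic_order(seq_expr, key_expr, order_expr):
    # One FLWOR expression per possible value of the order attribute;
    # each branch has a fully static order by specification.
    branches = {}
    for direction in ("ascending", "descending"):
        branches[direction] = (
            f"for $t2q:x in {seq_expr}\n"
            f"    order by {key_expr} {direction}\n"
            f"    return $t2q:x"
        )
    # The AVT value is tested at runtime to pick the branch.
    return (
        f"if (({order_expr}) eq 'descending')\n"
        f"then {branches['descending']}\n"
        f"else {branches['ascending']}"
    )


if __name__ == "__main__":
    print(compile_dynamic_order("$items", "$t2q:x/@price", "$order"))
```

The number of branches grows with the product of the possible values of each dynamic attribute, which is why the approach only works for attributes with a small, fixed set of values.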


White-space stripping. Through the xsl:strip-space and
xsl:preserve-space declarations [7, §4.3], XSLT 2.0 allows
declaring per element name whether white-space from the input
document should be preserved or not. This functionality is not
available in XQuery.


Serialization. XSLT 2.0 gives very detailed control over how 
the generated output is serialized through the xsl:output and 
xsl:character-map declarations. Most of this serialization con- 
trol is not available in XQuery and thus these cannot be fully sup- 
ported by our translation. 

Finally, the xsl:disable-output-escaping attribute from XSLT 1.0
is optional in XSLT 2.0 and has not been considered in the
translation.


5. EXPERIMENTAL EVALUATION 


The translation rules presented earlier have allowed the
implementation of a compiler from XSLT 2.0 into XQuery. The purpose
of this section is to describe briefly our implementation and report
on our first experiments with that implementation.


Implementation. Most of the translation rules have been
implemented in Java. The main instructions that have not been
implemented include xsl:sort, xsl:for-each-group, xsl:key and
xsl:number. Moreover, the features discussed in the previous section
have not been implemented.

The compiler architecture relies on three-stage processing. During
the first stage, the XSLT stylesheet is parsed using a standard
SAX parser. The second stage consists of applying the translation
rules that can be treated on the fly, as events are received. This is
notably the case for the xsl:value-of or xsl:if instructions. The
instructions that cannot be treated on the fly are those which require
the full list of templates, for instance xsl:apply-templates or
xsl:next-match. Those instructions are kept in memory and
translated during the third stage.

The main advantage of such an architecture is that it keeps memory
consumption very low during compilation while being fast.
Indeed, experiments show relatively low resource consumption
even for relatively big stylesheets like docbook [14].


Experiments. We have performed two kinds of experiments: on
conformance and on performance. The purpose of the conformance
experiments is to evaluate the correctness of the compiler. The
compiler has been run over the Xalan conformance test suite [15],
which contains 1686 XSLT 1.0 test cases (it is the largest
component of the OASIS suite [10, 12]). The generated queries are then
processed using Saxon 8.0 [6]. The compiler is able to compile and
run properly 55% of the test cases. Most failures occur either
because the test case includes an unimplemented instruction (more
than 25%) or because Saxon crashes or produces a wrong result (more
than 10%). We expect that as more mature XQuery implementations
emerge, the number of tests passed will increase accordingly.

The goal of the second experiment is to compare the performance
of the evaluation of queries produced by the compiler against the
original XSLT stylesheet. We have run several XSLT transformations
from the Sarvega [11] benchmark and their XQuery equivalents
using Saxon 8.0. One advantage of using Saxon is that it
executes both programs using the same internal runtime. It therefore
allows a fair comparison, in particular because optimizations are
applied on the same instruction set. However, it is worth noting
that Saxon currently provides better optimizations for XSLT
than for XQuery.

Figure 2 shows a summary of the experiments made on the
recipe and MathML transformations. Each transformation has been
applied over different input document sizes. The figure shows that
the XSLT transformations and the XQuery queries execute in O(n),
n being the size of the input document. This is a very promising
result demonstrating that our compiler does not change the
algorithmic complexity. It is in part due to our efficient translation
of match patterns: replacing our reverse pattern approach by the naive
translation defined in [7, §5.5.3] results in nonlinear behavior (the
performance degradation is such that, in the MathML example, Saxon is
unable to process even a 10K document). However, Figure 2 also
shows that the queries perform worse than the original stylesheet
by a constant factor. This loss of performance is mainly located
in the execution of the xsl:apply-templates instruction. Indeed,
Saxon provides an aggressive optimization for looking up the
right template to instantiate (using hashtables), whereas the
generated queries test sequentially which template function to execute.
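
The gap between the two lookup strategies can be illustrated with a small Python sketch. It does not describe Saxon's actual data structures, which we have not examined: it assumes templates that match either one element name or any element (written None), already sorted by precedence and priority, and all names are hypothetical.

```python
from collections import defaultdict


def lookup_sequential(templates, element_name):
    # What the generated if-then-else does: test every template in
    # order until one matches.
    for name, label in templates:
        if name is None or name == element_name:
            return label
    return None


def build_index(templates):
    # Group templates by the element name they match, remembering
    # their rank in the sorted list.
    index = defaultdict(list)
    for rank, (name, label) in enumerate(templates):
        index[name].append((rank, label))
    return index


def lookup_indexed(index, element_name):
    # Only templates for this name and the wildcard templates are
    # considered; the lowest rank wins, as in the sequential scan.
    candidates = index.get(element_name, []) + index.get(None, [])
    return min(candidates)[1] if candidates else None
```

Both functions return the same template, but the indexed version inspects only the candidates registered for the element name instead of the whole list.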


6. RELATED WORK 


At the language level, XSLT 2.0 [7] and XQuery 1.0 [3] are de- 
fined in close collaboration by the W3C XSLT and XML Query 
working groups. At the infrastructure level, there has not been 
enough work on how to make both languages interoperate. To the 
best of our knowledge, SAXON [6] is the only public implementa- 
tion that supports both XQuery and XSLT. Although it is likely that 
SAXON reuses as much infrastructure to support the two languages 
as possible, there is little information available that describes how 
this is achieved. In [9], Moerkotte describes an implementation of 
XSLT on top of a database management system. That approach re- 
lies on compiling XSLT into a database algebra. This makes the 
approach more difficult to apply in the various application con- 
texts we have been considering, where looser coupling is useful. 
In addition, the way his approach covers the complete XSLT lan- 
guage, notably how the specific details of its semantics are handled, 
is not fully specified. In comparison, our approach is to rely on the 
common features between XQuery and XSLT in order to provide a 
lightweight, user-friendly, yet complete, implementation. 

From a theoretical standpoint, several papers have studied the se- 
mantics of XSLT. In [13], Wadler proposes a denotational seman- 
tics for XSLT patterns. The approach we have described here is a 
more efficient variant of the same semantics, relying on the notion 
of pattern reversal. In [8, 2], Bex, Maneth and Neven propose a 
precise semantics for a fairly large fragment of the XSLT language, 
based on tree grammars. Part of that semantics can be used to im- 
plement the template-based approach we described. However, it is 
not complete, instead trying to identify a fragment of XSLT with 
good complexity properties. 


7. CONCLUSION 


In this paper, we have presented a general method for translating
the highly declarative rule-based approach of XSLT into the purely
functional XQuery approach, leading the way to closer integration
between the two languages. Our initial experiments have shown the
feasibility of this approach by confirming that the evaluations of
an XSLT transformation and its generated XQuery share the same
algorithmic complexity.

Figure 2: Performance comparison between XSLT and XQuery (left: recipes, right: mathml)

However, a naive evaluation of the generated XQuery exhibits a
performance penalty of up to a factor of 7 compared to the initial
XSLT transformation. Addressing this performance degradation
constitutes an interesting challenge. We plan to investigate how a
combination of context-sensitive flow analysis and function
specialization can be applied to the generated XQuery to statically
reduce the list of templates considered in the if-then-else expression
of the XQuery applicator functions.

This work has highlighted some important differences between 
the two languages. Most of them seem justified by the different 
design rationales, but we could not find reasonable grounds to ac- 
count for the following differences, and, therefore, we suggest that 
the two working groups adopt a common solution: 


• differences in white-space processing: unlike XQuery, XSLT
  allows finer control over white-space processing;

• although the two languages share the same serialization
  specification, XQuery, unlike XSLT, does not define a
  processor-independent mechanism to specify serialization attributes.
APPENDIX
A. MATCH PATTERN 


[[ ]]_toselect performs the translation of a pattern into the
expression for which an existence test will be performed (by
fn:exists()). The first step in the translation consists of obtaining
the Equivalent Expression (EE) defined in XSLT 2.0 [7, §5.5.3]. The
EE is an XPath expression whose first step may have attribute-or-top
and child-or-top as axis. The translation rule of an EE is as follows
(EPS denotes a step in the EE defined in [7, §5.5.3]):


[[EPS]]_toselect = $t2q:dot/(.)[ [[EPS]]_match ]
[[EPS_0/.../EPS_n]]_toselect =
    $t2q:dot/(.)[ [[EPS_n]]_match ][
        [[ [[EPS_n]]_axis ]]_inv::node()[ [[EPS_(n-1)]]_match ]/
        .../[[ [[EPS_1]]_axis ]]_inv::node()[ [[EPS_0]]_match ] ]

[[axis::m[P]]]_axis = axis

[[child]]_inv = parent
[[descendant-or-self]]_inv = ancestor-or-self
[[attribute]]_inv = parent
[[self]]_inv = self

[[axis::m[P]]]_match = ([[axis]]_inv::node()/axis::m[P] = .)
[[child-or-top::m[P]]]_match =
    if (parent::node())
    then (parent::node()/child::m[P] = .)
    else self::m[P][not(. instance of attribute())]
[[attribute-or-top::m[P]]]_match =
    if (parent::node())
    then (parent::node()/attribute::m[P] = .)
    else ((. instance of attribute()) and [[m[P]]]_test)
[[id(value)]]_match = (id(value) = .)
[[root(self::node())]]_match = (root(self::node()) = .)
[[key(name, value)]]_match = ([[key(name, value)]]_const = .)

[[KindTest[P]]]_test = (. instance of KindTest)
    and exists((.)[P])
[[ncn:*[P]]]_test = (namespace-uri(.) eq ns)
    and exists((.)[P])                  - where ncn resolves to ns
[[*:localName[P]]]_test = (local-name(.) eq localName)
    and exists((.)[P])
[[ncn:localName[P]]]_test = (namespace-uri(.) eq ns)
    and (local-name(.) eq localName)
    and exists((.)[P])                  - where ncn resolves to ns
[[*[P]]]_test = exists((.)[P])


When the pattern is a union of patterns, then the translation is the 
union of the translated patterns. 


B. TEMPLATE INSTANTIATION 
B.1 Built-in templates 


The following XQuery function captures the semantics of select- 
ing and invoking built-in templates. 


declare function t2q:builtInApplyTemplates(
    $t2q:dot as node()?,
    $t2q:pos as xs:integer,
    $t2q:last as xs:integer,
    $t2q:mode as xs:string,
    $t2q:param_1 as item()*, ...,
    $t2q:param_p as item()*)
  as item()*
{
  if (exists($t2q:dot/self::text()
             | $t2q:dot/(.)[. instance of attribute()]))
  then string-join(
         for $t2q:d in data($t2q:dot)
         return ($t2q:d cast as xs:string), ' ')
  else if (exists($t2q:dot/self::comment()
                  | $t2q:dot/self::processing-instruction()))
  then ()
  else
    let $t2q:sequence := $t2q:dot/node() return
    let $t2q:inner-last := count($t2q:sequence)
    return
      for $t2q:inner-dot
          at $t2q:inner-pos in $t2q:sequence return
        t2q:applyTemplates($t2q:inner-dot,
            $t2q:inner-pos,
            $t2q:inner-last, $t2q:mode,
            $t2q:param_1, ..., $t2q:param_p)
};


B.2 xsl:next-match instruction 
The translation rule generating the XQuery applicator function 
corresponding to xsl:next-match is defined as follows: 


[[ (template_1, ..., template_n) ]]
=
declare function t2q:applyNextMatch(
    $t2q:dot as node()?,
    $t2q:pos as xs:integer,
    $t2q:last as xs:integer,
    $t2q:mode as xs:string,
    $t2q:param_1 as item()*, ...,
    $t2q:param_p as item()*,
    $t2q:impPrec as xs:integer,
    $t2q:priority as xs:double)
  as item()*
{
  if ([[template_1]]_priority < $t2q:priority
      and [[template_1]]_impPrec < $t2q:impPrec
      and $t2q:mode = [[template_1]]_mode
      and exists([[template_1]]_toselect))
  then [[template_1]]_invoke
  else
  ...
  if ([[template_n]]_priority < $t2q:priority
      and [[template_n]]_impPrec < $t2q:impPrec
      and $t2q:mode = [[template_n]]_mode
      and exists([[template_n]]_toselect))
  then [[template_n]]_invoke
  else
    t2q:builtInApplyTemplates($t2q:dot, $t2q:pos,
        $t2q:last, $t2q:mode,
        $t2q:param_1, ..., $t2q:param_p)
};


where [[ ]]_priority generates the priority of a template and
[[ ]]_impPrec its import precedence. The additional parameters
($t2q:impPrec and $t2q:priority) indicate the import precedence and
priority of the current template. They are used in the body of the
generated function to restrict the set of considered templates.

The translation rule for each xsl:next-match instruction is as fol- 
lows: 


[[ <xsl:next-match select='expr' mode='mode'>
     xsl:with-param*
   </xsl:next-match> ]]_const
=
let $t2q:sequence := [[expr]]_Expr return
let $t2q:inner-last := count($t2q:sequence)
return
  for $t2q:inner-dot
      at $t2q:inner-pos in $t2q:sequence
  return
    t2q:applyNextMatch(
      $t2q:inner-dot, $t2q:inner-pos,
      $t2q:inner-last,
      'mode',
      [[1]]_ParamValue(xsl:with-param*), ...,
      [[p]]_ParamValue(xsl:with-param*),
      [[.]]_currentTemplateImpPrec,
      [[.]]_currentTemplatePriority)


where [[.]]_currentTemplateImpPrec (resp. [[.]]_currentTemplatePriority)
generates the import precedence (resp. the priority) of the current
template at the location of the invocation of xsl:next-match if it is
statically known; otherwise (e.g., if xsl:next-match is invoked inside
a named template), it generates the variable $t2q:impPrec (resp.
$t2q:priority), which is always used to indicate the import
precedence (resp. the priority) of the current template.


B.3 xsl:apply-imports instruction


The translation of xsl:apply-imports follows the same principles
as those presented for xsl:next-match. In addition to the parameters
of t2q:applyTemplates, the XQuery applicator function
implementing the xsl:apply-imports logic takes a parameter
that indicates the import path of the stylesheet module where
xsl:apply-imports is invoked. This is used to restrict the list of
considered templates.


